Most Probable Explanations for Probabilistic Database Queries (Extended Abstract)
Authors
Abstract
Probabilistic databases (PDBs) have been widely studied in the literature, as they form the foundations of large-scale probabilistic knowledge bases like NELL and Google’s Knowledge Vault. In particular, probabilistic query evaluation has been investigated intensively as a central inference mechanism. However, despite its power, query evaluation alone cannot extract all the relevant information expressed in PDBs. Inspired by the maximal posterior probability computations in Probabilistic Graphical Models (PGMs) [3], we investigate the problem of finding most probable explanations for database queries to exploit the potential of such large databases to their full extent. The most probable database [2] is the (classical) database with the largest probability that satisfies a given query. Intuitively, the query defines constraints on the data, and the goal is to find the most probable database that satisfies these constraints. We also introduce a more intricate notion, called most probable hypothesis, which is only a partial database satisfying the query. The most probable hypothesis contains only facts that contribute to the satisfaction of the query, which allows us to pinpoint the most probable explanations for the query more precisely. We study the complexity of the corresponding decision problems for a variety of database query languages. In particular, we consider ontology-mediated queries (OMQs), which enrich unions of conjunctive queries (UCQs) with the power of Datalog± ontologies and allow us to query PDBs in a more advanced manner [1]. We show that the complexity of these problems varies significantly with the ontology language and the underlying complexity-theoretic assumptions. Our results provide tight complexity bounds for a multitude of Datalog± languages (which cover some Horn Description Logics).
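To make the notion of a most probable database concrete, the following is a minimal brute-force sketch in Python. It is not the method of the paper. It assumes a tuple-independent PDB, in which each fact is present independently with its own probability, and it models the query as an ordinary Python predicate over a set of facts. The function name `most_probable_database`, the example facts, and their probabilities are all hypothetical and chosen only for illustration.

```python
from itertools import product


def most_probable_database(facts, query):
    """Return (world, probability) for the most probable world satisfying query.

    facts: dict mapping each fact to its probability (assumption: facts are
           independent, i.e. a tuple-independent PDB).
    query: function taking a frozenset of facts and returning True or False.
    Enumerates all 2^n worlds, so it is only usable for tiny examples.
    """
    names = list(facts)
    best_world, best_prob = None, 0.0
    for included in product([True, False], repeat=len(names)):
        # Probability of a world: p for each included fact, 1 - p for each excluded one.
        prob = 1.0
        for fact, keep in zip(names, included):
            prob *= facts[fact] if keep else 1.0 - facts[fact]
        world = frozenset(f for f, keep in zip(names, included) if keep)
        if prob > best_prob and query(world):
            best_world, best_prob = world, prob
    return best_world, best_prob


# Hypothetical toy PDB with three uncertain facts.
facts = {
    ("Author", "alice", "p1"): 0.9,
    ("Author", "bob", "p1"): 0.4,
    ("Cites", "p2", "p1"): 0.7,
}


def query(world):
    # Hypothetical Boolean query: bob authored p1 and p2 cites p1.
    return ("Author", "bob", "p1") in world and ("Cites", "p2", "p1") in world


world, prob = most_probable_database(facts, query)
```

In this toy instance the query forces both the bob fact and the citation fact into the database, and the alice fact is included because 0.9 exceeds 0.1, so the resulting probability is 0.9 × 0.4 × 0.7. The enumeration is exponential in the number of facts and serves only to illustrate the definition.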
Similar resources
Most Probable Explanations for Probabilistic Database
Probabilistic databases (PDBs) have been widely studied in the literature, as they form the foundations of large-scale probabilistic knowledge bases like NELL and Google’s Knowledge Vault. In particular, probabilistic query evaluation has been investigated intensively as a central inference mechanism. However, despite its power, query evaluation alone cannot extract all the relevant information...
Most Probable Explanations for Probabilistic Database Queries
Forming the foundations of large-scale knowledge bases, probabilistic databases have been widely studied in the literature. In particular, probabilistic query evaluation has been investigated intensively as a central inference mechanism. However, despite its power, query evaluation alone cannot extract all the relevant information encompassed in large-scale knowledge bases. To exploit this pote...
The Most Probable Database Problem
This paper proposes a novel inference task for probabilistic databases: the most probable database (MPD) problem. The MPD is the most probable deterministic database where a given query or constraint is true. We highlight two distinctive applications, in database repair of key and dependency constraints, and in finding most probable explanations in statistical relational learning. The MPD probl...
Sensitivity Analysis and Explanations for Robust Query Evaluation in Probabilistic Databases
Probabilistic database systems have successfully established themselves as a tool for managing uncertain data. However, much of the research in this area has focused on efficient query evaluation and has largely ignored two key issues that commonly arise in uncertain data management: First, how to provide explanations for query results, e.g., “Why is this tuple in my result ?” or “Why does this...
An Overview on Querying and Learning in Temporal Probabilistic Databases
Probabilistic databases store, query and manage large amounts of uncertain information in an efficient way. This paper summarizes my thesis which advances the state-of-the-art in probabilistic databases in three different ways: First, we present a closed and complete data model for temporal probabilistic databases. Queries are posed via temporal deduction rules which induce lineage formulas cap...